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Regularizing binding energy distributions and thermodynamics of hydration. Theory 
and application to water modeled with classical and ab initio simulations 
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The high-energy tail of the distribution of solute-solvent interaction energies is poorly character- 
ized for condensed systems, but this tail region is of principal interest in determining the excess free 
energy of the solute. We introduce external fields centered on the solute to modulate the short-range 
repulsive interaction between the solute and solvent. This regularizes the binding energy distribution 
and makes it easy to calculate the free energy of the solute with the field. Together with the work 
done to apply the field in the presence and absence of the solute, we calculate the excess chemical 
potential of the solute. We present the formal development of this idea and apply it to study liquid 
water. 

Keywords: potential distribution, hydration free energy, hybrid Monte Carlo, vapor-liquid equilibrium 



The excess chemical potential, /i° x , of a solute (x) 
within a general thermodynamic system (the solvent) is 
that part of the Gibbs free energy that would vanish if 
the interaction between the solute and solvent were to 
vanish. For this reason the excess chemical potential is 
of most interest in understanding a complex system on 
a molecular basis. Since the excess chemical potential 
is measurable, calculating it also serves as an important 
validation of molecular simulations. 

In this communication, we present a new approach to 
calculate /i° x , one that sidesteps the current dominant 
paradigm based on alchemically changing solutes. Our 
approach, a generalization of the quasichemical_organi- 
zation [l| of the potential distribution theorem [2] , rests 
on appreciating and exploiting the different energies with 
which a solute interacts with a solvent at different spatial 
scales. The approach leads to a transparent accounting of 
the hydration thermodynamics and is readily applicable 
to systems modeled by ab initio potentials or molecu- 
larly complex solutes such as proteins. In this commu- 
nication we present our results for liquid water modeled 
by both empirical and ab initio potentials. Studies on a 
protein modeled by empirical potentials and of aqueous 
ions modeled by ab initio potentials will be presented in 
subsequent articles. 

To appreciate the need for alchemical approaches, con- 
sider the formal relation between /x° x and solute-solvent 
interactions 1, 2]: 



/3/4 x = In I P x (e)e? e de = ln<e" e > 



(la) 



In / P^\e)e- 0e de = -ln( e -^) , (lb) 
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where /3 = l/k^T and e = Un+i — Un — U x is the interac- 
tion energy of the solute (x) with the solvent. Un+i is the 
potential energy of the N +1 particle system comprising 
the iV solvent molecules and the one solute molecule. Un 
is the potential energy of the solvent and U x is the poten- 
tial energy of the solute. P x (e) is the density distribution 
of e in a system in which the solute and the solvent are 
thermally coupled, the averaging under these conditions 

is denoted by (...). Pi (e) is the density distribution 
when the solute and the solvent are thermally uncoupled, 
with (. . .)q denoting averaging in this instance. 

For condensed systems the high-e tail of P x (e) or the 

low-e tail of Px (e) are rarely well sampled and thus us- 
ing either of Eqs.Q] usually fails. It is for this reason that 
/U^ x is often calculated by accumulating the work done in 
slowly transforming a solute from a noninteracting solute 
to a fully interacting solute. However, such alchemical 
transformations pose conceptual challenges in the con- 
text of ab initio simulations; an intermediate solute with 
fractional charge (and uncertain spin state) can prove 
troublesome within quantum chemical calculations. For 
example, using ab initio simulations, an alchemical ap- 
proach was used to calculate the hydration free energy of 
some cations, but the same approach could not be used 
for a chloride anion 3]. Thus the broader applicability 
of alchemical approaches within ab initio simulations re- 
mains unknown, and in particular, we are unaware of a 
similar study for the hydration free energy of a water 
molecule. 

In contrast to alchemical transformations, our strategy 
is to regularize the P x (e) distribution by imposing a con- 
straint [4j. Here the constraint is an external field <j>(r) 
centered on the particle such that in the presence of this 
field the solute-solvent interactions are better behaved. 
Using the well-known rule of averages (for example, see 



0), 



(F)u 



3u F), 



(e-^)o 



for any potential u and the mechanical variable F, where 
(. . .)„ denotes averaging when the solvent evolves in the 
presence of the additional potential u, we can show that 



/3/4 x = ln<e-"*) - ln(e- w ) - He~ pe ) 



P4>\ 



-/3e\ 



(2) 



The quantity, ln(e - ^), is the negative of the work done 
to apply <f) in the solute-solvent system; — ln(e _/3 ^)o 
is the work done to apply <f> in the neat solvent sys- 
tem; and — ln(e _ ^ £ )0 is the interaction free energy of 
the uncoupled solute with the solvent in the presence of 
<f). Notice that whereas — ln(e~' 3e )o can prove challeng- 
ing to estimate because of close solute-solvent contacts, 
— ln(e~^ e )0 is expected to be more amenable to a direct 
estimation. Indeed, if <fi is chosen such that the regular- 
ized binding energy distribution P* (s\<f>) is a Gaussian 
of mean (e)^, and variance (Se 2 }^, where Se = e — (e)<f,, 
we have 



ln(e 



-pe\ 



m 



P 



(Se 2 



(3) 



The hrst two terms on the right-hand side of Eq. [2] can 
be obtained using thermodynamic integration, but the 
solute-solvent Hamiltonian is unchanged, an obvious ad- 
vantage in ab initio simulations. For example, if </>(£) 
varies between zero to its hnal value as the parameter £ 
is varied from zero to A, then |5j 



ln(e 



-/30\ 



>/o\* 



d£, 



(4) 



where the ensemble averaging is in the presence of the 
field at the current value of £. 

Eq. [2] is a generalization of the quasichemical ap- 
proach [l| for any external field 0. In the original quasi- 
chemical development [l| , solvent molecules were strictly 
excluded from a sphere of radius A around the solute. In 
the present notation, this amounts to applying the field 



0hO;A) 




(5) 



where r is the distance between the solvent and the so- 
lute, xq = (e~^ h ) is the probability of observing zero 
solvent molecules in a coordination sphere of radius A 



around the solute, 



Po 



-Wt' 



o is the probability of 



observing an empty cavity of radius A in neat water, and 
/Pouter = — m ( e ~ /3£ )0h is the contribution to the free en- 
ergy due to the interaction of the solute with the solvent 
outside the coordination sphere. For solutes that inter- 
act strongly with water or for solutes with complicated 
molecular shape, obtaining xq and po can prove difficult, 
especially for a large cavity radius Q. The flexibility 
to choose any field and the thermodynamic integration 
method (Eq. [U both alleviate this difficulty within the 



context of Eq. [2j In analogy to the original quasichem- 
ical notation, since the fields used here (Eqs. [7]) model 
soft-cavities, we rewrite Eq. [2] as 



/#Mv 



\ux s - \np s + Pul 



(6) 



where the solute (x) is a distinguished water (w) 
molecule. 

In this study we tested two fields 



^ramp 



(r; A) = a ^(r-\) 2 +b 2 - ft) 
b 



</>ij (r; A) = Aa 



-A+ ^2b 



X+^2b 



(7a) 



(7b) 



where a and b are positive constants and r < A; (j>(r; A) = 
for r > A. 

Before presenting the simulation details, we note that a 
variant of the quasichemical approach with a soft-cutoff 
has been presented recently Q- That approach uses a 
probabilistic model to partition solvent between the inner 
shell (r < A) and the bulk and is thus rather different 
from the approach here. 

We next consider the simulation implementation of 

Eq. m 

Classical simulations: Classical simulations were 
performed with the NAMD code [8]. Using the Tel- 
interface in NAMD, the force d<j)/d\ is applied to the 
solvent molecules within A and the ensemble average (in- 
tegrand in Eq.2]) estimated. From the average force- vs-A 
data, the free energies are constructed. 

We simulate a cubic box of 512 SPC/E [9] water 
molecules at a temperature of 298 K using a Langevin 
thermostat and a pressure of 1 bar using a Langevin baro- 
stat [l(|. The decay constant for the thermostat was 1 
ps _1 . The barostat piston period was 200 fs and the de- 
cay time was 100 fs. The SHAKE [11] algorithm was 
used to constrain the geometry of each water molecule. 
The solute water molecule was fixed at the center of the 
simulation cell. 

We sample A between and 5.0 A. First the A = 2.5 A 
state was equilibrated for 2.4 ns and then all other A val- 
ues were generated by successively changing A by ±0.1 A. 
At each A, we performed simulations for a total of 600 ps 
and used the last 500 ps for analysis. 

For calculating /^Ster S i we extend the trajectory for 
states with A between 4.5 A and 5.0 A by an additional 
600 ps, saving configurations every 0.5 ps. For the last 
1100 frames, we inserted a test water molecule in the 
center of the cavity and calculated its binding energy 
with the medium. Five randomly chosen orientations of 
the test water were used per frame. A direct numerical 
estimate of (e~^ e )^ and modeling the binding energy as 
a Gaussian gave nearly the same values for /u™ ter B . 

Ab initio simulations: We simulated water using the 
BLYP-D electron density functional and the Cp2k code 



[12 |. The parameters for the electronic structure calcu- 
lation are as in earlier studies [13], [lj] . In contrast to the 
specification of the external force d(f>/d\ in NAMD, the 
external potential <\> is provided as an input to the CP2k 
code. Since the code calculates the force by differentiat- 
ing an interpolation function of the potential, we use a 
non-zero b parameter in c/> r amp (Eq. I7a[) to ensure forces 
are well-behaved at the cavity boundary. 

We simulate the liquid at a density of 0.997 g/cm 3 
(number density of 33.33 inn" 3 ) and a temperature of 
350 K. The system contains 64 water molecules. To sam- 
ple the NVT ensemble, we use the hybrid Monte Carlo 
method [13j . The initial configuration is taken from a 
previous MD simulation with the same functional [14j . 
During the first 500 sweeps, the time step for integrat- 
ing the equations of motion was adjusted to provide an 
acceptance ratio of 70%. Then the time step was fixed 
and the system equilibrated for another 500 sweeps. (One 
sweep of the HMC comprises 50 molecular dynamics time 
steps.) 

For estimating In x s , a chain of HMC simulation start- 
ing at A = 2.5 A and going up to 3.75 A in increments 
of 0.25 A was performed. During the construction of the 
chain, at each A we performed 300 sweeps. Subsequently, 
for calculating averages, simulations at each A were ex- 
tended for another 1200 sweeps. For estimating \np s , a 
similar strategy is used but with A = 1.75 A to A = 3.75 
A. For both lnx s and lnp s , from the total of 1500 sweeps 
per A, we use the last 1000 for analysis. 

To compute Pouter, si we chose the last 1000 configura- 
tions. We estimate the binding energy distribution us- 
ing the procedure documented earlier [lj], except that 
we perform 16 test particle insertions per configuration. 
Mouter s was estimated using the ensemble average (e^^ 5 )^ 
and by the Gaussian model. 

We next present the results and their discussion. 

Classical Simulations: Figure [1] collects the results 
for the SPC/E water simulation for different choices of 
the external field. 

As expected, the net chemical potential is nearly inde- 
pendent of A or the field, as it obviously must be. 

From the rule of averages, we can show that 



Po(rh)=P.(A)./(fh)-(e^*|r>rh), 



(8) 



where po( r h) is the probability to observe a hard-sphere 
of radius r^, f(r^) is the fraction of configurations where 
a hard-sphere of radius r^ is found in a simulation with an 
external field far; A) and (. . . \r > r^)^ denotes ensemble 
averaging over such configurations. The location of water 
molecules is defined by r (Eq. [7J and A specifies the range 
up to which the field is active. (Here A > rv) Figure [5] 
shows the results of such a calculation, with r^ chosen 
such that f(rh) is around 0.25 for a given A. As the 
figure shows, using either field (Eqs. [7J we predict the 
same value of po( r h) within statistical uncertainties of 
about 0.2 kcal/mol. 

Fig. [U reveals that for A > 3.0 A, the calculated po is 
systematically slightly larger (the deviation in — fce^lnpo 
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FIG. 1. Various components of the free energy obtained using 
Eq.|B](w = H 2 0) for the SPC/E water model. k B T\nx s /p s 
smoothly approaches zero (0) as A — > 0. Open symbols: 
fa ( E q- E3 with a = 0.155 kcal/mol and b = 3.1655 A, 
the LJ energy and diameter parameters for SPC/E wa- 
ter. Filled symbols: cA ram p (Eq. I7a[) with 6 = and a — 
5.0 kcal/mol. The average value of /i^ over A is shown by 
the solid blue (</> r am P ) and dashed blue (fa) lines; /i™ w 
—7.0 kcal/mol is in excellent agreement with estimates based 
on histogram overlap (—6.85 kcal/mol, Ref. uw and from the 
equilibrium vapor and liquid densities ( — 7.0 kcal/mol, Ref. 
18). At A = 5.0 A, the statistical uncertainty in k B T\n(x s /ps) 
is 0.1 kcal/mol. The uncertainty in Pouter, a can be ignored. 
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FIG. 2. The free energy to create empty soft-and hard- 
cavities of radius A in neat water. Red lines : soft cav- 
ities. Filled symbols: hard-cavity from ciramp (Eq. I7a[> . 
Open Symbols: hard-cavity from fa (Eq. I7b|) . Parameters 
for the ci's are noted in Fig. [T] The inset shows the deviation 
of the calculated — k B T\npo from the revised scaled particle 
theory [l5j] prediction (solid black line, main figure). 



is negative) relative to the revised scaled particle theory 



TABLE I. The different contributions to the excess chemical 
potential of water obtained from the ab initio HMC simula- 
tions, ntuter.s is calculated as an ensemble average; the result 
from a Gaussian fit (Fig. [3]) to the binding energy data is given 
in parenthesis. Eq. [7a] with b = 0.001 A and a — 10 kcal/mol 
is used to regularize the interactions. The first three rows 
are using Eq. [7a] and the next three rows are with Eq. I7bl 
Energies are in kcal/mol. Uncertainties are at la level. 
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(SPT) [l5j predictions. This small difference is likely due 
to a lower bulk surface tension predicted by the SPC/E 
model [16J . whereas the SPT predictions are based on 
the experimental bulk surface tension. Indeed, using the 
surface tension of SPC/E, we are able to reparametrize 
SPT to reproduce the po( r /i) obtained in this study. 

Ab initio Simulations: Table Q] collects the esti- 
mated [i%? of water. Using 0, amp (Eq. [Taj), the aver- 
age of n^f over the A values considered here is about 
— 6.9 ± 0.4 kcal/mol). Earlier, using a hard-sphere condi- 
tioning and also an order of magnitude more data — 110k 
energy values |14| versus 16k energy values used here — 
for calculating (j™ tcr s we found /x™ sa — 6.0 kcal/mol for 
a 64- water molecule system at an average temperature of 
362 K [14]. The present estimate appears in fair agree- 
ment with the earlier result, especially considering the 
difference in temperatures and also the amount of data 
collected. 

The average (j%f using </> L j (Eq. [7EJ is —5.0 ± 0.4. 
The estimates using </> ram p (Eq. [7a|l and </>lj (Eq. I7bl) 
thus bound the earlier estimate of —6.0 kcal/mol [14j . 
but in contrast to results with the classical potential 
model (Fig. [1} , the variation in the numerical estimate 
of /x|f with both A and the field is high (being about 
1-2 kcal/mol). Comparing fceTlnx s and kBTlnp s be- 
tween r amp and </>ij (Table GJ suggests that the (large) 
step-size of 0.25 A for integration underlies the observed 
variation in /i™. Identifying optimal integration strate- 
gies, perhaps including non- uniform sampling of A, is left 
for future work. 

Figure [3] compares the binding energy of the test par- 
ticle with the bulk medium for different choices of the 
field. The value of /J,^ teI B calculated using a hard-sphere 
conditioning of radius 3.0 A (see [Tj]) is similar to that 
for conditioning with </> ra mp (A = 3.25 A); not surpris- 
ingly, the low-e tail is also similar for 0h (A = 3.0 A) and 

</>ramp (A = 3.3 A). 

In summary, we have presented a method of obtaining 
the hydration thermodynamics of the solute that avoids 
alchemical transformation of the solute. This method 



FIG. 3. Distribution of the interaction energies P^°' (e\4>) 
of the test particle in the presence of different regularizing 
fields <f>. The binding energies are translated vertically for 
clarity. Triangles (red) and circles (blue): 0h is used; data 
has been taken from our earlier study [141] . Squares (black): 

</>ram P with a — 10 kcal/mol and b = 0.001 A. The solid lines 
are Gaussian fit to the respective distribution. For the radii 
range considered here, regularizing with Eq. l7al leads to a less 
well-characterized low-e tail relative to Eq. [5] as is expected. 



allows one to readily calculate the hydration thermody- 
namics of solutes using ab initio simulations. Within the 
quasichemical approach, using a hard-sphere cutoff to de- 
marcate local and long-range interactions restricts one to 
small (typically < 4.0 A) cavity sizes, since these are the 
cavities that can be observed in simulations with suffi- 
cient statistical precision. The present development re- 
moves this limitation, and it is now possible to probe cav- 
ity sizes that can accommodate small globular proteins 
(Weber and Asthagiri, in preparation). Thus exploring 
the hydration thermodynamics of proteins in aqueous 
media with the conceptual clarity afforded by the quasi- 
chemical approach appears to be within reach. Finally, 
the perspective the regularization approach provides on 
theories of liquids also remains to be explored. 
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